Method and system for extraction of table data from documents for robotic process automation

ABSTRACT

Improved techniques to access content from documents in an automated fashion. The improved techniques permit content of tables within documents to be retrieved and then used by computer systems operating various software programs (e.g., application programs). Consequently, Robotic Process Automation (RPA) systems are able to accurately understand the content of tables within documents so that users, application programs and/or software robots can operate on the documents with increased reliability and flexibility. The documents being received and processed can be electronic images of documents. For example, the documents can be business transaction documents which include tables, such as purchase orders, invoices, delivery receipts, bills of lading, etc.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to: (i) U.S. Patent Provisional Application No. 63/087,844, filed Oct. 5, 2020, and entitled “METHOD AND SYSTEM FOR EXTRACTION OF TABLE DATA FROM DOCUMENTS FOR ROBOTIC PROCESS AUTOMATION,” which is hereby incorporated herein by reference; (ii) U.S. Patent Provisional Application No. 63/087,851, filed Oct. 5, 2020, and entitled “METHOD AND SYSTEM FOR EXTRACTION OF DATA FROM DOCUMENTS FOR ROBOTIC PROCESS AUTOMATION,” which is hereby incorporated herein by reference; and (iii) U.S. Patent Provisional Application No. 63/087,847, filed Oct. 5, 2020, and entitled “MACHINE LEARNING SUPPORTING DOCUMENT DATA EXTRACTION,” which is hereby incorporated herein by reference.

This application is related to: (i) U.S. patent application Ser. No. ______ [Att. Dkt. No. 108-P004/20022], filed Jan. ______, 2021, and entitled “METHOD AND SYSTEM FOR EXTRACTION OF DATA FROM DOCUMENTS FOR ROBOTIC PROCESS AUTOMATION,” which is hereby incorporated herein by reference; and (ii) U.S. patent application Ser. No. ______ [Att. Dkt. No. 108-P005/20023], filed Jan. ______, 2021, and entitled “MACHINE LEARNING SUPPORTING DOCUMENT DATA EXTRACTION,” which is hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

Robotic Process Automation (RPA) systems enable automation of repetitive and manually intensive computer-based tasks. In an RPA system, computer software, namely a software robot (often referred to as a “bot”), may mimic the actions of a human being in order to perform various computer-based tasks. For instance, an RPA system can be used to interact with one or more software applications through user interfaces, as a human being would do. Therefore, RPA systems typically do not need to be integrated with existing software applications at a programming level, thereby eliminating the difficulties inherent to integration. Advantageously, RPA systems permit the automation of application level repetitive tasks via software robots that are coded to repeatedly and accurately perform the repetitive tasks.

In the case of documents that are to be accessed and processed by one or more software applications being used by software agents, the documents can be analyzed from images of the documents. The document image can contain text which can be obtained by Optical Character Recognition (OCR) processing. While OCR processing of documents can recognize text contained therein, such processing is not well suited to capture data from tables contained in documents such as invoices, purchase orders, and other business transaction documents.

Therefore, there is a need for improved approaches to understand and extract data from tables provided within documents such that RPA systems are able to accurately understand the content of the documents, so that software robots can operate on the documents with increased reliability and flexibility.

SUMMARY

Embodiments disclosed herein can provide for extraction of data from documents, namely, images of documents. The extraction processing can be hierarchical, such as being performed in multiple levels (i.e., multi-leveled). At an upper level, numerous different objects within a document can be detected along with positional data for the objects, and the objects can be categorized based on a type of object. Then, at lower levels, the different objects can be processed differently depending on the type of object. As a result, data extraction from the document can be performed with greater reliability and precision.

Embodiments disclosed herein can concern improved techniques to access content from documents in an automated fashion. The improved techniques permit content of tables within documents to be retrieved and then used by computer systems operating various software programs (e.g., application programs). Consequently, RPA systems are able to accurately understand the content of tables within documents so that users, application programs and/or software robots can operate on the documents with increased reliability and flexibility. The documents being received and processed can be electronic images of documents.

The invention can be implemented in numerous ways, including as a method, system, device, or apparatus (including a computer readable medium and a graphical user interface). Several embodiments of the invention are discussed below.

As a computer-implemented method for extracting data from an image of a document, one embodiment can, for example, include at least: detecting a plurality of objects in the image of the document, the plurality of detected objects including at least a table, a table header and table header elements; identifying columns within the table; selecting at least one of the columns; identifying rows within the table based on positions of data items within the selected column; identifying cells within the table at the intersections of the identified rows and the identified columns; and extracting content from the identified cells.

As a computer-implemented method for extracting data from an image of a document, one embodiment can, for example, include at least: receiving the image for the document; detecting objects within the document image; identifying a table, a table header and table header elements from the detected objects; identifying columns within the table; selecting at least one of the columns; identifying rows within the table based on positions of data items within the selected column; identifying cells within the table at the intersections of the identified rows and the identified columns; and extracting content from the identified cells.

As a non-transitory computer readable medium including at least computer program code tangibly stored thereon for extracting data from an image of a document, one embodiment can, for example, include at least: computer program code for detecting a plurality of objects in the image of the document, the plurality of detected objects including at least a table, a table header and table header elements; computer program code for identifying columns within the table; computer program code for selecting at least one of the columns; computer program code for identifying rows within the table based on positions of data items within the selected column; computer program code for identifying cells within the table at the intersections of the identified rows and the identified columns; and computer program code for extracting content from the identified cells.

As a non-transitory computer readable medium including at least computer program code tangibly stored thereon for extracting data from an image of a document, one embodiment can, for example, include at least: computer program code for identifying a table and table header elements from objects detected within the image of the document, the table header elements being part of a table header for the table; computer program code for identifying columns within the table; computer program code for selecting at least one of the columns; computer program code for identifying rows within the table based on positions of data items within the selected column; computer program code for identifying cells within the table at the intersections of the identified rows and the identified columns; and computer program code for extracting content from the identified cells.
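
By way of illustration only, the following Python sketch shows one way the above sequence (identify columns, select a column, derive rows from positions of data items in the selected column, form cells, extract content) could fit together. Every name and data structure in the sketch is a hypothetical simplification, not the claimed implementation.

    # Illustrative sketch of the extraction sequence. The column bands and
    # word positions are assumed to come from upstream object detection
    # and OCR; all structures here are simplified placeholders.
    def identify_rows(item_tops: list[float],
                      table_bottom: float) -> list[tuple[float, float]]:
        """Rows are vertical bands between consecutive data items in the selected column."""
        tops = sorted(item_tops)
        return list(zip(tops, tops[1:] + [table_bottom]))

    def extract_cells(columns: list[tuple[float, float]],
                      rows: list[tuple[float, float]],
                      words: list[tuple[float, float, str]]) -> dict:
        """Assign each word (x, y, text) to the cell whose row and column bands contain it."""
        cells: dict = {}
        for x, y, text in words:
            col = next((i for i, (x0, x1) in enumerate(columns) if x0 <= x < x1), None)
            row = next((j for j, (y0, y1) in enumerate(rows) if y0 <= y < y1), None)
            if col is not None and row is not None:
                cells.setdefault((row, col), []).append(text)
        return {key: " ".join(parts) for key, parts in sorted(cells.items())}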

Other aspects and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like elements, and in which:

FIG. 1 is a block diagram of a programmatic automation environment according to one embodiment.

FIG. 2 is a block diagram of a data extraction system according to one embodiment.

FIGS. 3A and 3B are flow diagrams of a data extraction process according to one embodiment.

FIGS. 4A and 4B are flow diagrams of a table data extraction process according to another embodiment.

FIGS. 5A-5D are flow diagrams of a table data extraction process according to still another embodiment.

FIGS. 6A-6G illustrate an exemplary document during various phases of data extraction from a table within the exemplary document, according to one or more embodiments.

FIG. 7 is a block diagram of a Robotic Process Automation (RPA) system according to one embodiment.

FIG. 8 is a block diagram of a generalized runtime environment for bots in accordance with another embodiment of the RPA system illustrated in FIG. 7.

FIG. 9 illustrates a block diagram of yet another embodiment of the RPA system of FIG. 7 configured to provide platform independent sets of task processing instructions for bots.

FIG. 10 is a block diagram illustrating details of one embodiment of the bot compiler illustrated in FIG. 9.

FIG. 11 illustrates a block diagram of an exemplary computing environment for an implementation of an RPA system, such as the RPA systems disclosed herein.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Embodiments disclosed herein can provide for extraction of data from documents, namely images of documents. The extraction processing can be hierarchical or multi-leveled. At an upper level, numerous different objects within a document can be detected and categorized, with positional data and a type recorded for each object. Then, at lower levels, the different objects can be processed differently depending on the type of object. As a result, data extraction from the document can be performed with greater precision.

Embodiments disclosed herein can concern improved techniques to access content from documents in an automated fashion. The improved techniques permit content of tables within documents to be retrieved and then used by computer systems operating various software programs (e.g., application programs). Consequently, RPA systems are able to accurately understand the content of tables within documents so that users, application programs and/or software robots can operate on the documents with increased reliability and flexibility. The documents being received and processed can be electronic images of documents. For example, the documents can be business transaction documents which include tables, such as purchase orders, invoices, delivery receipts, bills of lading, etc.

Generally speaking, RPA systems use computer software to emulate and integrate the actions of a human interacting within digital systems. In an enterprise environment, these RPA systems are often designed to execute a business process. In some cases, the RPA systems use Artificial Intelligence (AI) and/or other machine learning capabilities to handle high-volume, repeatable tasks that previously required humans to perform. The RPA systems support a plurality of software automation processes. The RPA systems also provide for creation, configuration, management, execution, monitoring, and performance of software automation processes.

A software automation process can also be referred to as a software robot, software agent, or a bot. A software automation process can interpret and execute tasks on a user's behalf. Software automation processes are particularly well suited for handling many of the repetitive tasks that humans perform every day. Software automation processes can perform a task or workflow that they are tasked with once or many times. As one example, a software automation process can locate and read data in a document, email, file, or other data source. As another example, a software automation process can connect with one or more Enterprise Resource Planning (ERP), Customer Relations Management (CRM), core banking, or other business systems to distribute data where it needs to be in whatever format is necessary. As another example, a software automation process can perform data tasks, such as reformatting, extracting, balancing, error checking, moving, copying, etc. As another example, a software automation process can retrieve data from a webpage, application, screen, or window. As still another example, a software automation process can be triggered based on time or an event, and can serve to take files or data sets and move them to another location, whether it is to a customer, vendor, application, department or storage. These various capabilities can also be used in any combination. As an example of an integrated software automation process, the software automation process can start a task or workflow based on a trigger, such as a file being uploaded to an FTP system. The integrated software automation process can then download that file, scrape relevant data from it, upload the relevant data to a database, and then send an email to inform the recipient that the data has been successfully processed.
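
By way of illustration only, the following is a minimal Python sketch of such a triggered download-scrape-store-notify workflow. The watched folder, table schema, and mail addresses are hypothetical placeholders, not part of any RPA product.

    # Sketch of a triggered workflow: pick up new CSV files, load their
    # rows into a database, and send a notification email. All paths,
    # names, and addresses below are assumptions for illustration.
    import csv
    import smtplib
    import sqlite3
    from email.message import EmailMessage
    from pathlib import Path

    INBOX = Path("/ftp/inbox")  # assumed drop folder watched for new files

    def process(path: Path) -> None:
        with path.open(newline="") as f:               # scrape the file
            rows = list(csv.DictReader(f))             # assumes sku,qty headers
        with sqlite3.connect("records.db") as db:      # upload to a database
            db.execute("CREATE TABLE IF NOT EXISTS items (sku TEXT, qty TEXT)")
            db.executemany("INSERT INTO items VALUES (:sku, :qty)", rows)
        msg = EmailMessage()                           # notify the recipient
        msg["Subject"] = f"Processed {path.name}"
        msg["From"], msg["To"] = "bot@example.com", "ops@example.com"
        msg.set_content(f"{len(rows)} rows loaded successfully.")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    for new_file in INBOX.glob("*.csv"):               # simple polling trigger
        process(new_file)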

Embodiments of various aspects of the invention are discussed below with reference to FIGS. 1-11. However, those skilled in the art will readily appreciate that the detailed description given herein with respect to these figures is for explanatory purposes, as the invention extends beyond these limited embodiments.

FIG. 1 is a block diagram of a programmatic automation environment 100 according to one embodiment. The programmatic automation environment 100 is a computing environment that supports robotic process automation. The computing environment can include or make use of one or more computing devices. Each of the computing devices can, for example, be an electronic device having computing capabilities, such as a mobile phone (e.g., smart phone), tablet computer, desktop computer, portable computer, server computer, and the like.

The programmatic automation environment 100 includes a robotic process automation system 102 that provides the robotic process automation. The robotic process automation system 102 supports a plurality of different robotic processes, which are denoted software automation processes 104. These software automation processes 104 can also be referred to as “software robots,” “bots” or “software bots.” The robotic process automation system 102 can create, maintain, execute, and/or monitor software automation processes 104. The robotic process automation system 102 can also report status or results of software automation processes 104.

On execution of one or more of the software automation processes 104, the software automation processes 104, via the robotic process automation system 102, can interact with one or more software programs. One such software program is an extraction program 106. The extraction program 106, when operating, typically interacts with one or more documents 108. In some cases, the extraction program 106 is seeking to access documents 108 that contain data that is to be extracted and then suitably processed. The documents 108 are typically digital images of documents, and such documents can include text and graphical objects, such as one or more tables. The RPA system 102 can include sophisticated processing and structures to support the extraction of data from such document images, and in particular extraction of data from tables within the documents. Examples of documents 108 including tables are invoices, purchase orders, delivery receipts, bills of lading, etc.

When robotic process automation operations are being performed, the robotic process automation system 102 seeks to interact with the extraction program 106. However, since the robotic process automation system 102 is not integrated with the extraction program 106, the robotic process automation system 102 requires an ability to understand what content is contained in the document 108. For example, the document 108 can include a table 110. In this regard, the robotic process automation system 102 interacts with the extraction program 106 by interacting with the content in the document 108. By doing so, the software automation process 104 being carried out via the robotic process automation system 102 can effectively interface with the document 108 as would a user, even though no user is involved because the actions by the software automation process 104 are programmatically performed. Once the content of the document 108 is captured and understood, the robotic process automation system 102 can perform an action requested by the software automation process 104 by inducing action with respect to the extraction program 106.

When robotic process automation operations are being performed, the robotic process automation system 102 seeks to interact with the application program 112. However, since the robotic process automation system 102 is not integrated with the application program 112, the robotic process automation system 102 requires an ability to understand what content is being presented in the application window 114. For example, the content being presented in the application window 114 can pertain to a document, which can include a table 110 within the document 108. In this regard, the robotic process automation system 102 interacts with the application program 112 by interacting with the content in the application window 114 corresponding to the application program 112. The content can pertain to a document being displayed in the application window. By doing so, the software automation process 104 being carried out via the robotic process automation system 102 can effectively interface with the document being displayed in the application window 114 as would a user, even though no user is involved because the actions by the software automation process 104 are programmatically performed.

In one embodiment, the application program 112 can host the extraction program 106. In such case, the robotic process automation system 102 can interact with the application program 112 to carry out the software automation process 104, and the application program 112 can interact with the extraction program 106 as needed.

FIG. 2 is a block diagram of a data extraction system 200 according to one embodiment. The data extraction system 200 receives a document to be processed. The document to be processed is an electronic document, such as an image (PNG, JPEG, etc.) or a Portable Document Format (PDF) file. The document can then be processed to recognize the text within the document, such as through use of Optical Character Recognition (OCR) 202. Next, the document can undergo object detection 204. The object detection 204 serves to identify various objects within the document as well as identifying the different classes or types of those objects that have been identified within the document. An object localizer and classifier 206 can then be used to further process the identified objects depending upon their classification (or type).

In this particular embodiment illustrated in FIG. 2, the object localizer and classifier 206 serves to classify the detected objects into three distinct classes, that is, class A, class B and class C. The detected objects that are classified as class A objects are directed by the object localizer and classifier 206 to class A data extraction 208, where data can be extracted from the class A objects. The detected objects that are classified as class B objects can be directed by the object localizer and classifier 206 to class B data extraction 210, where data can be extracted from the class B objects. The detected objects that are classified as class C objects can be directed by the object localizer and classifier 206 to class C data extraction 212, where data can be extracted from the class C objects. In this regard, different data extraction processing can be provided for different types of objects. As a result, more robust and efficient data extraction is able to be provided separately for different types of objects.
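
A minimal Python sketch of this class-based dispatch may clarify the structure; the class labels and extractor bodies below are illustrative placeholders rather than the described system's actual classes.

    # Sketch of routing detected objects to per-class extractors and then
    # aggregating the per-object results (cf. the object localizer and
    # classifier 206 and aggregator 220). Labels here are assumptions.
    from typing import Callable

    def extract_key_value(obj: dict) -> dict:
        return {"type": "key-value", "data": obj.get("text", "")}

    def extract_table(obj: dict) -> dict:
        return {"type": "table", "data": obj.get("cells", [])}

    def extract_graphic(obj: dict) -> dict:
        return {"type": "graphic", "data": obj.get("bbox")}

    EXTRACTORS: dict[str, Callable[[dict], dict]] = {
        "key_value_block": extract_key_value,
        "table_block": extract_table,
        "graphic_block": extract_graphic,
    }

    def extract_document(detected_objects: list[dict]) -> list[dict]:
        """Dispatch each detected object by class and aggregate the output."""
        results = []
        for obj in detected_objects:
            handler = EXTRACTORS.get(obj["class"])
            if handler is not None:
                results.append(handler(obj))
        return results  # aggregated into one document data structure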

Additionally, the data extraction system 200 includes an aggregator 220. The aggregator 220 is coupled to the class A data extraction 208, the class B data extraction 210 and the class C data extraction 212 such that the extracted data from the various objects (or blocks) of the document can be aggregated together to form a document data file that is produced by the data extraction system 200 and contains all the extracted data for the document.

The classes used by the object localizer and classifier 206 can vary with implementation. These classes are also referred to as blocks or object blocks. Some exemplary classes for documents include the following: key-value block, key info block, table block, graphic block, etc.

FIGS. 3A and 3B are flow diagrams of a data extraction process 300 according to one embodiment. The data extraction process 300 can, for example, be performed by an extraction program, such as the extraction program 106 illustrated in FIG. 1.

The data extraction process 300 can begin with a decision 302 that determines whether a document has been received. When the decision 302 determines that a document has not yet been received, the data extraction process 300 can await receipt of such a document. Once the decision 302 determines that a document has been received, a decision 304 can determine whether text is available from the document. For example, if the document is provided as an image, then the text would not be directly available. On the other hand, if the document is a vector-based PDF document, the text would normally be available. When the decision 304 determines that text is available from the document, then the text is extracted 308 from the document. Alternatively, when the decision 304 determines that text is not available from the document, the text within the document can be recognized 306 using OCR.
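
As a minimal illustration of decision 304, and assuming the third-party pdfplumber, pdf2image, pytesseract, and Pillow libraries (none of which is required by the embodiments described herein), embedded text might be tried first with OCR as the fallback:

    # Sketch of decision 304: use embedded text when available (block 308),
    # otherwise recognize the text with OCR (block 306). Library choices
    # here are assumptions for illustration.
    import pdfplumber                         # assumed: reads embedded PDF text
    import pytesseract                        # assumed: OCR engine wrapper
    from pdf2image import convert_from_path   # assumed: rasterizes PDF pages
    from PIL import Image

    def get_document_text(path: str) -> str:
        if path.lower().endswith(".pdf"):
            with pdfplumber.open(path) as pdf:
                text = "\n".join(page.extract_text() or "" for page in pdf.pages)
            if text.strip():                  # decision 304: text is available
                return text                   # block 308: extract directly
            pages = convert_from_path(path)   # no embedded text; rasterize
            return "\n".join(pytesseract.image_to_string(p) for p in pages)
        return pytesseract.image_to_string(Image.open(path))  # block 306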

Following block 308 or block 306, after the text within the document has been obtained, object detection can be performed 310. The object detection seeks to detect one or more objects within the document. In this embodiment, the objects that can be detected include an information block, a key-value block, and a table block. However, the invention is not limited to detection of these particular types of objects.

After the object detection has been performed, a decision 312 can determine whether an information block has been detected within the document. When the decision 312 determines that an information block has been detected, then information block processing can be performed 314. After the information block processing has been performed 314, or directly after the decision 312 when no information block has been detected, a decision 316 can determine whether a key-value block has been detected. When the decision 316 determines that a key-value block has been detected, key-value block processing can be performed 318. After the key-value block processing has been performed 318, or directly following the decision 316 when a key-value block has not been detected, a decision 320 can determine whether a table block/object has been detected. When the decision 320 determines that a table block/object has been detected, table block/object processing can be performed 322. Here, when the document includes a table, there is a table block and associated table objects. The table block/object processing can process these components of the table. After the table block/object processing has been performed 322, or directly following the decision 320 when a table block/object has not been detected, the data extraction process can aggregate 324 processing results. Here, to the extent that the object detection has detected one or more information blocks, key-value blocks and/or table blocks/objects, the results from the processing thereof can be aggregated 324. The result of the aggregation 324 can be provided in a document data file. Following the aggregation 324, the data within the table provided in the document has been extracted and thus the data extraction process 300 can end.

FIGS. 4A and 4B are flow diagrams of a table data extraction process 400 according to another embodiment. The table data extraction process 400 can, for example, be performed by an extraction program, such as the extraction program 106 illustrated in FIG. 1.

The table data extraction process 400 receives 402 a document image for a document that includes a table. Next, the document image can be processed to detect 404 objects within the document image. After the objects have been detected 404, a table, a table header and table header elements can be identified 406 from the detected objects. Next, columns for the table can be identified 408 using one or more of the objects that have been detected 404.

Thereafter, one or more of the identified columns can be selected 410 based on the table header elements. Next, row anchor objects can be determined 412. Then, a decision 414 can determine whether the row anchor objects have been detected with confidence. When the decision 414 determines that the row anchor objects have been detected with confidence, rows for the table can be identified 416 based on positions of the row anchor objects. Then, content can be extracted 418 from each resulting cell within the table. The cells are defined by the intersection of columns and rows. Following the block 418, since the columns and rows of the table were detected and content from the resulting cells has been extracted, the table data extraction process 400 can end. Alternatively, if the decision 414 determines that the row anchor objects have not been detected with confidence, user feedback can be obtained 420 to assist with extraction of content from the table within the document.
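
The confidence gate at decision 414 might be sketched as follows; the anchor representation and the 0.8 threshold are illustrative assumptions.

    # Sketch of decision 414: accept the row anchors only when every anchor
    # was detected with sufficient confidence; otherwise fall through to
    # user feedback (block 420). The threshold value is an assumption.
    CONFIDENCE_THRESHOLD = 0.8

    def rows_from_confident_anchors(anchors: list[dict]) -> list[float] | None:
        """anchors: [{'y': top coordinate, 'score': detector confidence}, ...]"""
        if not anchors or min(a["score"] for a in anchors) < CONFIDENCE_THRESHOLD:
            return None                         # block 420: request user feedback
        return sorted(a["y"] for a in anchors)  # block 416: row positions

    rows = rows_from_confident_anchors(
        [{"y": 120.0, "score": 0.97}, {"y": 164.0, "score": 0.93}])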

FIGS. 5A-5D are flow diagrams of a table data extraction process 500 according to still another embodiment. The table data extraction process 500 can, for example, be performed by an extraction program, such as the extraction program 106 illustrated in FIG. 1.

The table data extraction process 500 receives 502 a document image for a document that includes a table. Next, the document image can be processed to detect 504 objects within the document image. After the objects have been detected 504, a table, a table header and table header elements can be identified 506 from the detected objects. Next, columns for the table can be identified 508 using one or more of the objects that have been detected 504.

Thereafter, one or more of the identified columns can be selected 510 based on the table header elements. Next, a decision 512 determines whether a table header element approximately pertains to a quantity. Here, the table header element can be “quantity” or an alias therefor, such as “amount.” When the decision 512 determines that the table header element corresponds to a quantity, then word blocks in the quantity column can be identified 514. Next, rows for the table can be identified 516 based on positions of the identified word blocks. After the rows for the table have been identified 516, the table data extraction process 500 can proceed to extract 518 content from each resulting cell within the table. The cells are defined by the intersection of columns and rows.

On the other hand, when the decision 512 determines that the table header element does not correspond to a quantity, then the table data extraction process 500 can perform other processing, such as shown in FIG. 5C. The other processing can proceed with a decision 520 that determines whether the table header element corresponds to a price or amount. Here, the table header element can be “price” or “amount” or an alias therefor, such as “total.” When the decision 520 determines that the table header element does approximately correspond to a price or amount, then word blocks in the price/amount column can be identified 522. Then, rows for the table can be identified 524 based on positions of the identified word blocks. After the rows for the table have been identified 524, the table data extraction process 500 can proceed to extract 526 content from each resulting cell within the table.

On the other hand, when the decision 520 determines that the table header element does not approximately correspond to a price or amount, then the table data extraction process 500 can perform other processing, such as shown in FIG. 5D. The other processing can proceed by selection 528 of a left-most column from the columns identified within the table. Then, the table data extraction process 500 can recognize 530 text within the left-most column. Thereafter, bounding word blocks can be formed 532 around each word from the recognized text. Rows for the table can then be identified 534 based on positions of the bounding word blocks. Finally, content from each resulting cell within the table can be extracted 536. Following the block 536, the table data extraction process 500 can end.
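
The column-selection fallback described across FIGS. 5B-5D (quantity, then price/amount, then the left-most column) might look roughly as follows in Python; the alias lists are illustrative assumptions rather than an exhaustive set.

    # Sketch of selecting the row-anchor column per FIGS. 5B-5D: prefer a
    # quantity column (decision 512), then a price/amount column (decision
    # 520), else fall back to the left-most column (block 528).
    QUANTITY_ALIASES = {"quantity", "qty", "amount"}
    PRICE_AMOUNT_ALIASES = {"price", "unit price", "amount", "total"}

    def select_anchor_column(columns: list[dict]) -> dict:
        """columns: [{'header': header text, 'x': left coordinate, ...}, ...]"""
        def header(col: dict) -> str:
            return col["header"].strip().lower()
        for aliases in (QUANTITY_ALIASES, PRICE_AMOUNT_ALIASES):
            for col in columns:
                if header(col) in aliases:
                    return col
        return min(columns, key=lambda col: col["x"])  # left-most column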

FIGS. 6A-6G illustrate an exemplary document during various phases of data extraction from a table within the exemplary document, according to one or more embodiments.

FIG. 6A illustrates the exemplary document after object detection. The detected objects include detection of a table and a table header. Each of the table and the table header is denoted in the exemplary document by a bounding box surrounding the same. The detected table is denoted by a table bounding box 602. The detected table header is denoted by a table header bounding box 604.

FIG. 6B illustrates the exemplary document after table header elements and columns have been identified. Each of the header elements is denoted in the exemplary document by a bounding box surrounding the corresponding header element. Each of the columns is denoted in the exemplary document by a bounding box surrounding the corresponding column. As illustrated in FIG. 6B, for a given column, the bounding box can surround not only the corresponding header element but also the corresponding column. For example, as illustrated in FIG. 6B, a first recognized column has a first header element “Quantity” and column entries “10”, “1”, “1”, “20”, “5”, and “100”. A bounding box 606 surrounds the first header element and the first recognized column.

A second recognized column has a second header element “Item No.” and column entries “00234”, “00236”, “01345”, “54302”, “69205”, and “01562”. A bounding box 608 surrounds the second header element and the second recognized column.

A third recognized column has a third header element “Item Description” and column entries “16-12 Vinyl Connector”, “Toggle H Duty DPDT On-Off-On”, “12-10 #10 Vinyl Flange Spade”, “UV Tie Mount 1.1″×1.1″”, “DP Connector 60A/3P/240V”, and “Cable Clip #8 SEU”. A bounding box 610 surrounds the third header element and the third recognized column.

A fourth recognized column has a fourth header element “Warehouse” and column entries “W1”, “W1”, “W1”, “W1”, “W1”, and “W1”. A bounding box 612 surrounds the fourth header element and the fourth recognized column.

A fifth recognized column has a fifth header element “Backorder” and column entries “0”, “0”, “0”, “0”, “0”, and “0”. A bounding box 614 surrounds the fifth header element and the fifth recognized column.

A sixth recognized column has a sixth header element “Unit Price” and column entries “1.20”, “2.12”, “0.56”, “4.25”, “1.42”, and “1.00”. A bounding box 616 surrounds the sixth header element and the sixth recognized column.

A seventh recognized column has a seventh header element “Total” and column entries “12.00”, “2.12”, “0.56”, “85.00”, “7.10”, and “100.00”. A bounding box 618 surrounds the seventh header element and the seventh recognized column.

FIG. 6C illustrates the exemplary document after columns having table header elements that match an alias type have been identified. Such table header elements can be used to identify columns that can serve to denote a row anchor pattern. That is, the elements within such a column can signal the position of the rows of the table. In this example, the columns that have been identified are the first column denoted by the bounding box 606, the sixth column denoted by the bounding box 616, and the seventh column denoted by the bounding box 618. The first column has a table header element of “Quantity”, the sixth column has a table header element “Unit Price”, and the seventh column has a table header element “Total”. These columns are suitable for selection for use in determining the rows within the table.
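
For illustration, matching header text against known alias types might be implemented along the following lines; the alias table and fuzzy-match cutoff are assumptions, not required behavior of the embodiments.

    # Sketch of matching a recognized header element to an alias type
    # (cf. FIG. 6C). A fuzzy match tolerates minor OCR noise.
    import difflib

    ALIAS_TYPES = {
        "quantity": ["quantity", "qty", "amount"],
        "price": ["unit price", "price", "total"],
    }

    def alias_type(header_text: str, cutoff: float = 0.8) -> str | None:
        """Return the alias type of a header element, or None when unmatched."""
        text = header_text.strip().lower()
        for kind, aliases in ALIAS_TYPES.items():
            if difflib.get_close_matches(text, aliases, n=1, cutoff=cutoff):
                return kind
        return None

    assert alias_type("Quantity") == "quantity"
    assert alias_type("Unit Price") == "price"
    assert alias_type("Warehouse") is None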

FIG. 6D illustrates the exemplary document after the first column (surrounded by the bounding box 606) having the table header element “Quantity” is selected for providing row anchors for the rows of the table. The column entries “10”, “1”, “1”, “20”, “5”, and “100” are each located and surrounded by individual bounding boxes 620, 622, 624, 626, 628 and 630, respectively. These bounding boxes 620-630 provide the row anchor pattern that can be used to determine the number and position of the rows within the table.

FIG. 6E illustrates the exemplary document after the sixth column (surrounded by the bounding box 616) having the table header element “Unit Price” is selected for providing row anchors for the rows of the table. The column entries “1.20”, “2.12”, “0.56”, “4.25”, “1.42”, and “1.00” are each located and surrounded by individual bounding boxes 632, 634, 636, 638, 640 and 642, respectively. These bounding boxes 632-642 provide the row anchor pattern that can be used to determine the number and position of the rows within the table.

FIG. 6F illustrates the exemplary document after the rows have been identified using the selected row anchors, such as those denoted in FIG. 6D or 6E. A given row can be identified as extending from the top left coordinate of one of the selected row anchors to the top left coordinate of the selected row anchor that is immediately below in the same column. In this example, six rows within the table have been identified, including a first row 644, a second row 646, a third row 648, a fourth row 650 and a fifth row 652.
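
As a sketch of the row construction just described, assuming each anchor is represented by its column and top-left y coordinate, consecutive anchors in the selected column bound consecutive rows, with the table's bottom edge (an assumed input) closing the last row:

    # Sketch of FIG. 6F row identification: each row spans from one selected
    # anchor's top edge to the top edge of the anchor immediately below it
    # in the same column; the table's bottom edge closes the final row.
    def row_bands(anchors: list[dict], column_id: int,
                  table_bottom: float) -> list[tuple[float, float]]:
        tops = sorted(a["y"] for a in anchors if a["column"] == column_id)
        return list(zip(tops, tops[1:] + [table_bottom]))

    quantity_anchors = [{"column": 0, "y": y}
                        for y in (120.0, 150.0, 180.0, 210.0, 240.0, 270.0)]
    print(row_bands(quantity_anchors, column_id=0, table_bottom=300.0))  # six rows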

FIG. 6G illustrates the exemplary document after the cells for the table have been identified. A cell is at the intersection of an identified row and an identified column. This can, for example, be done by an intersection over union approach. In this example, there are seven columns and six rows within the table, and thus forty-two cells have been identified.
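
A minimal sketch of the intersection over union (IoU) computation that such an approach relies on, with boxes represented as (x0, y0, x1, y1) tuples (a common but assumed representation):

    # Sketch of IoU for two axis-aligned boxes, as might be used to decide
    # which identified cell a given word box belongs to.
    def iou(a: tuple[float, float, float, float],
            b: tuple[float, float, float, float]) -> float:
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    # A word box is assigned to the cell it overlaps most strongly.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143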

Additional details on detecting objects can be found in DETECTION OF USER INTERFACE CONTROLS VIA INVARIANCE GUIDED SUB-CONTROL LEARNING, U.S. application Ser. No. 16/876,530, filed May 18, 2020, which is hereby incorporated by reference herein.

The various aspects disclosed herein can be utilized with or by robotic process automation systems. Exemplary robotic process automation systems and operations thereof are detailed below.

FIG. 7 is a block diagram of a robotic process automation (RPA) system 700 according to one embodiment. The RPA system 700 includes data storage 702. The data storage 702 can store a plurality of software robots 704, also referred to as bots (e.g., Bot 1, Bot 2, . . . , Bot n, where n is an integer). The software robots 704 can be operable to interact at a user level with one or more user level application programs (not shown). As used herein, the term “bot” is generally synonymous with the term software robot. In certain contexts, as will be apparent to those skilled in the art in view of the present disclosure, the term “bot runner” refers to a device (virtual or physical), having the necessary software capability (such as bot player 726), on which a bot will execute or is executing. The data storage 702 can also store a plurality of work items 706. Each work item 706 can pertain to processing executed by one or more of the software robots 704.

The RPA system 700 can also include a control room 708. The control room 708 is operatively coupled to the data storage 702 and is configured to execute instructions that, when executed, cause the RPA system 700 to respond to a request from a client device 710 that is issued by a user 712.1. The control room 708 can act as a server to provide to the client device 710 the capability to perform an automation task to process a work item from the plurality of work items 706. The RPA system 700 is able to support multiple client devices 710 concurrently, each of which will have one or more corresponding user session(s) 718, which provides a context. The context can, for example, include security, permissions, audit trails, etc. to define the permissions and roles for bots operating under the user session 718. For example, a bot executing under a user session cannot access any files or use any applications for which the user, under whose credentials the bot is operating, does not have permission. This prevents any inadvertent or malicious acts by a bot 704 executing under the user session 718.

The control room 708 can provide, to the client device 710, software code to implement a node manager 714. The node manager 714 executes on the client device 710 and provides a user 712 a visual interface via browser 713 to view progress of and to control execution of automation tasks. It should be noted that the node manager 714 can be provided to the client device 710 on demand, when required by the client device 710, to execute a desired automation task. In one embodiment, the node manager 714 may remain on the client device 710 after completion of the requested automation task to avoid the need to download it again. In another embodiment, the node manager 714 may be deleted from the client device 710 after completion of the requested automation task. The node manager 714 can also maintain a connection to the control room 708 to inform the control room 708 that device 710 is available for service by the control room 708, irrespective of whether a live user session 718 exists. When executing a bot 704, the node manager 714 can impersonate the user 712 by employing credentials associated with the user 712.

The control room 708 initiates, on the client device 710, a user session 718 (seen as a specific instantiation 718.1) to perform the automation task. The control room 708 retrieves the set of task processing instructions 704 that correspond to the work item 706. The task processing instructions 704 that correspond to the work item 706 can execute under control of the user session 718.1, on the client device 710. The node manager 714 can provide update data indicative of status of processing of the work item to the control room 708. The control room 708 can terminate the user session 718.1 upon completion of processing of the work item 706. The user session 718.1 is shown in further detail at 719, where an instance 724.1 of user session manager 724 is seen along with a bot player 726, proxy service 728, and one or more virtual machine(s) 730, such as a virtual machine that runs Java® or Python®. The user session manager 724 provides a generic user session context within which a bot 704 executes.

The bots 704 execute on a bot player, via a computing device, to perform the functions encoded by the bot. Some or all of the bots 704 may, in certain embodiments, be located remotely from the control room 708. Moreover, the devices 710 and 711, which may be conventional computing devices, such as, for example, personal computers, server computers, laptops, tablets and other portable computing devices, may also be located remotely from the control room 708. The devices 710 and 711 may also take the form of virtual computing devices. The bots 704 and the work items 706 are shown in separate containers for purposes of illustration but they may be stored in separate or the same device(s), or across multiple devices. The control room 708 can perform user management functions and source control of the bots 704, along with providing a dashboard that provides analytics and results of the bots 704, performing license management of software required by the bots 704, and managing overall execution and management of scripts, clients, roles, credentials, security, etc. The major functions performed by the control room 708 can include: (i) a dashboard that provides a summary of registered/active users, tasks status, repository details, number of clients connected, number of scripts passed or failed recently, tasks that are scheduled to be executed and those that are in progress, and any other desired information; (ii) user/role management—permits creation of different roles, such as bot creator, bot runner, admin, and custom roles, and activation, deactivation and modification of roles; (iii) repository management—to manage all scripts, tasks, workflows, reports, etc.; (iv) operations management—permits checking status of tasks in progress and history of all tasks, and permits the administrator to stop/start execution of bots currently executing; (v) audit trail—logs creation of all actions performed in the control room; (vi) task scheduler—permits scheduling tasks which need to be executed on different clients at any particular time; (vii) credential management—permits password management; and (viii) security management—permits rights management for all user roles. The control room 708 is shown generally for simplicity of explanation. Multiple instances of the control room 708 may be employed where large numbers of bots are deployed to provide for scalability of the RPA system 700.

In the event that a device, such as device 711 (e.g., operated by user 712.2), does not satisfy the minimum processing capability to run a node manager 714, the control room 708 can make use of another device, such as device 715, that has the requisite capability. In such case, a node manager 714 within a Virtual Machine (VM), seen as VM 716, can be resident on the device 715. The node manager 714 operating on the device 715 can communicate with browser 713 on device 711. This approach permits RPA system 700 to operate with devices that may have lower processing capability, such as older laptops, desktops, and portable/mobile devices such as tablets and mobile phones. In certain embodiments the browser 713 may take the form of a mobile application stored on the device 711. The control room 708 can establish a user session 718.2 for the user 712.2 while interacting with the control room 708, and the corresponding user session 718.2 operates as described above for user session 718.1, with user session manager 724 operating on device 710 as discussed above.

In certain embodiments, the user session manager 724 provides five functions. First is a health service 738 that maintains and provides a detailed logging of bot execution, including monitoring memory and CPU usage by the bot and other parameters such as number of file handles employed. The bots 704 can employ the health service 738 as a resource to pass logging information to the control room 708. Execution of the bot is separately monitored by the user session manager 724 to track memory, CPU, and other system information. The second function provided by the user session manager 724 is a message queue 740 for exchange of data between bots executed within the same user session 718. The third function is a deployment service (also referred to as a deployment module) 742 that connects to the control room 708 to request execution of a requested bot 704. The deployment service 742 can also ensure that the environment is ready for bot execution, such as by making available dependent libraries. The fourth function is a bot launcher 744 which can read metadata associated with a requested bot 704, launch an appropriate container, and begin execution of the requested bot. The fifth function is a debugger service 746 that can be used to debug bot code.

The bot player 726 can execute, or play back, a sequence of instructions encoded in a bot. The sequence of instructions can, for example, be captured by way of a recorder when a human performs those actions, or alternatively the instructions can be explicitly coded into the bot. These instructions enable the bot player 726 to perform the same actions as a human would do in their absence. In one implementation, the instructions can be composed of a command (or action) followed by a set of parameters. For example, Open Browser is a command, and a URL would be the parameter for it to launch a web resource. Proxy service 728 can enable integration of external software or applications with the bot to provide specialized services. For example, an externally hosted artificial intelligence system can enable the bot to understand the meaning of a “sentence.”
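
The command-plus-parameters structure might be represented as follows; the command names and parameter fields are purely illustrative and do not reflect any actual bot encoding.

    # Sketch of command-plus-parameters instructions and a tiny player loop.
    # Command names and parameter fields are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Command:
        action: str                          # e.g., "OpenBrowser"
        parameters: dict = field(default_factory=dict)

    def play(instructions: list[Command]) -> None:
        """Execute each instruction in sequence, as a bot player would."""
        for cmd in instructions:
            print(f"executing {cmd.action} with {cmd.parameters}")

    play([Command("OpenBrowser", {"url": "https://example.com"}),
          Command("Click", {"selector": "#submit"})])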

The user 712.1 can interact with node manager 714 via a conventional browser 713 which employs the node manager 714 to communicate with the control room 708. When the user 712.1 logs in from the client device 710 to the control room 708 for the first time, the user 712.1 can be prompted to download and install the node manager 714 on the device 710, if one is not already present. The node manager 714 can establish a web socket connection to the user session manager 724, deployed by the control room 708, that lets the user 712.1 subsequently create, edit, and deploy the bots 704.

FIG. 8 is a block diagram of a generalized runtime environment for bots 704 in accordance with another embodiment of the RPA system 700 illustrated in FIG. 7. This flexible runtime environment advantageously permits extensibility of the platform to enable use of various languages in encoding bots. In the embodiment of FIG. 8, RPA system 700 generally operates in the manner described in connection with FIG. 7, except that in the embodiment of FIG. 8, some or all of the user sessions 718 execute within a virtual machine 716. This permits the bots 704 to operate on an RPA system 700 that runs on an operating system different from an operating system on which a bot 704 may have been developed. For example, if a bot 704 is developed on the Windows® operating system, the platform agnostic embodiment shown in FIG. 8 permits the bot 704 to be executed on a device 852 or 854 executing an operating system 853 or 855 different than Windows®, such as, for example, Linux. In one embodiment, the VM 716 takes the form of a Java Virtual Machine (JVM) as provided by Oracle Corporation. As will be understood by those skilled in the art in view of the present disclosure, a JVM enables a computer to run Java® programs as well as programs written in other languages that are also compiled to Java® bytecode.

In the embodiment shown in FIG. 8, multiple devices 852 can execute operating system 1, 853, which may, for example, be a Windows® operating system. Multiple devices 854 can execute operating system 2, 855, which may, for example, be a Linux® operating system. For simplicity of explanation, two different operating systems are shown by way of example, and additional operating systems such as macOS®, or other operating systems, may also be employed on devices 852, 854 or other devices. Each device 852, 854 has installed therein one or more VMs 716, each of which can execute its own operating system (not shown), which may be the same or different than the host operating system 853/855. Each VM 716 has installed, either in advance or on demand from control room 708, a node manager 714. The embodiment illustrated in FIG. 8 differs from the embodiment shown in FIG. 7 in that the devices 852 and 854 have installed thereon one or more VMs 716 as described above, with each VM 716 having an operating system installed that may or may not be compatible with an operating system required by an automation task. Moreover, each VM has installed thereon a runtime environment 856, each of which has installed thereon one or more interpreters (shown as interpreter 1, interpreter 2, interpreter 3). Three interpreters are shown by way of example, but any runtime environment 856 may, at any given time, have installed thereupon fewer than or more than three different interpreters. Each interpreter 856 is specifically encoded to interpret instructions encoded in a particular programming language. For example, interpreter 1 may be encoded to interpret software programs encoded in the Java® programming language, seen in FIG. 8 as language 1 in Bot 1 and Bot 2. Interpreter 2 may be encoded to interpret software programs encoded in the Python® programming language, seen in FIG. 8 as language 2 in Bot 1 and Bot 2, and interpreter 3 may be encoded to interpret software programs encoded in the R programming language, seen in FIG. 8 as language 3 in Bot 1 and Bot 2.

Turning to the bots Bot 1 and Bot 2, each bot may contain instructions encoded in one or more programming languages. In the example shown in FIG. 8, each bot can contain instructions in three different programming languages, for example, Java®, Python® and R. This is for purposes of explanation, and the embodiment of FIG. 8 may be able to create and execute bots encoded in more or fewer than three programming languages. The VMs 716 and the runtime environments 856 permit execution of bots encoded in multiple languages, thereby permitting greater flexibility in encoding bots. Moreover, the VMs 716 permit greater flexibility in bot execution. For example, a bot that is encoded with commands that are specific to an operating system, for example, open a file, or that requires an application that runs on a particular operating system, for example, Excel® on Windows®, can be deployed with much greater flexibility. In such a situation, the control room 708 will select a device with a VM 716 that has the Windows® operating system and the Excel® application installed thereon. Licensing fees can also be reduced by serially using a particular device with the required licensed operating system and application(s), instead of having multiple devices with such an operating system and applications, which may be unused for large periods of time.

FIG. 9 illustrates a block diagram of yet another embodiment of the RPA system 700 of FIG. 7 configured to provide platform independent sets of task processing instructions for bots 704. Two bots 704, bot 1 and bot 2, are shown in FIG. 9. Each of bots 1 and 2 is formed from one or more commands 901, each of which specifies a user level operation with a specified application program, or a user level operation provided by an operating system. Sets of commands 906.1 and 906.2 may be generated by bot editor 902 and bot recorder 904, respectively, to define sequences of application level operations that are normally performed by a human user. The bot editor 902 may be configured to combine sequences of commands 901 via an editor. The bot recorder 904 may be configured to record application level operations performed by a user and to convert the operations performed by the user to commands 901. The sets of commands 906.1 and 906.2 generated by the editor 902 and the recorder 904 can include command(s) and schema for the command(s), where the schema defines the format of the command(s). The format of a command can, for example, include the input(s) expected by the command and their format. For example, a command to open a URL might include the URL, a user login, and a password to login to an application resident at the designated URL.

The control room 708 operates to compile, via compiler 908, the sets of commands generated by the editor 902 or the recorder 904 into platform independent executables, each of which is also referred to herein as a bot JAR (Java ARchive), that perform the application level operations captured by the bot editor 902 and the bot recorder 904. In the embodiment illustrated in FIG. 9, the set of commands 906, representing a bot file, can be captured in a JSON (JavaScript Object Notation) format, which is a lightweight data-interchange text-based format. JSON is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition—December 1999. JSON is built on two structures: (i) a collection of name/value pairs, which, in various languages, is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array; and (ii) an ordered list of values, which, in most languages, is realized as an array, vector, list, or sequence. Bots 1 and 2 may be executed on devices 710 and/or 715 to perform the encoded application level operations that are normally performed by a human user.
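
To make the two JSON structures concrete, a hypothetical bot file fragment serialized from Python might look like the following; the field names are invented for illustration and do not reflect any actual bot file schema.

    # Sketch: a hypothetical bot file as JSON, combining the two JSON
    # structures (name/value objects and ordered lists of values).
    import json

    bot_file = {
        "name": "invoice-bot",                         # name/value pairs
        "commands": [                                  # ordered list
            {"action": "OpenBrowser",
             "parameters": {"url": "https://example.com"}},
            {"action": "TypeText",
             "parameters": {"field": "login", "value": "user1"}},
        ],
    }
    print(json.dumps(bot_file, indent=2))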

FIG. 10 is a block diagram illustrating details of one embodiment of the bot compiler 908 illustrated in FIG. 9. The bot compiler 908 accesses one or more of the bots 704 from the data storage 702, which can serve as a bot repository, along with commands 901 that are contained in a command repository 1032. The bot compiler 908 can also access a compiler dependency repository 1034. The bot compiler 908 can operate to convert each command 901 via code generator module 910 to an operating system independent format, such as a Java command. The bot compiler 908 then compiles each operating system independent format command into byte code, such as Java byte code, to create a bot JAR. The convert command to Java module 910 is shown in further detail in FIG. 10 by JAR generator 1028 of a build manager 1026. The compiling to generate Java byte code module 912 can be provided by the JAR generator 1028. In one embodiment, a conventional Java compiler, such as javac from Oracle Corporation, may be employed to generate the bot JAR (artifacts). As will be appreciated by those skilled in the art, an artifact in a Java environment includes compiled code along with other dependencies and resources required by the compiled code. Such dependencies can include libraries specified in the code and other artifacts. Resources can include web pages, images, descriptor files, other files, directories and archives.

As noted in connection with FIG. 9, deployment service 742 can be responsible for triggering the process of bot compilation and then, once a bot has compiled successfully, executing the resulting bot JAR on selected devices 710 and/or 715. The bot compiler 908 can comprise a number of functional modules that, when combined, generate a bot 704 in a JAR format. A bot reader 1002 loads a bot file into memory with class representation. The bot reader 1002 takes as input a bot file and generates an in-memory bot structure. A bot dependency generator 1004 identifies and creates a dependency graph for a given bot. It includes any child bot, resource file such as a script, and document or image used while creating a bot. The bot dependency generator 1004 takes, as input, the output of the bot reader 1002 and provides, as output, a list of direct and transitive bot dependencies. A script handler 1006 handles script execution by injecting a contract into a user script file. The script handler 1006 registers an external script in a manifest and bundles the script as a resource in an output JAR. The script handler 1006 takes, as input, the output of the bot reader 1002 and provides, as output, a list of function pointers to execute different types of identified scripts, such as Python, Java, and VB scripts.

An entry class generator 1008 can create a Java class with an entry method, to permit bot execution to be started from that point. For example, the entry class generator 1008 takes, as an input, a parent bot name, such as “Invoice-processing.bot”, and generates a Java class having a contract method with a predefined signature. A bot class generator 1010 can generate a bot class and order command code in sequence of execution. The bot class generator 1010 can take, as input, an in-memory bot structure and generate, as output, a Java class in a predefined structure. A Command/Iterator/Conditional Code Generator 1012 wires up a command class with singleton object creation, manages nested command linking, iterator (loop) generation, and conditional (If/Else If/Else) construct generation. The Command/Iterator/Conditional Code Generator 1012 can take, as input, an in-memory bot structure in JSON format and generate Java code within the bot class. A variable code generator 1014 generates code for user defined variables in the bot, maps bot level data types to Java language compatible types, and assigns initial values provided by the user. The variable code generator 1014 takes, as input, an in-memory bot structure and generates Java code within the bot class. A schema validator 1016 can validate user inputs based on command schema and includes syntax and semantic checks on user provided values. The schema validator 1016 can take, as input, an in-memory bot structure and generate validation errors that it detects. The attribute code generator 1018 can generate attribute code, handle the nested nature of attributes, and transform bot value types to Java language compatible types. The attribute code generator 1018 takes, as input, an in-memory bot structure and generates Java code within the bot class. A utility classes generator 1020 can generate utility classes which are used by an entry class or bot class methods. The utility classes generator 1020 can generate, as output, Java classes. A data type generator 1022 can generate value types useful at runtime. The data type generator 1022 can generate, as output, Java classes. An expression generator 1024 can evaluate user inputs and generate compatible Java code, identify complex variable-mixed user inputs, inject variable values, and transform mathematical expressions. The expression generator 1024 can take, as input, user defined values and generate, as output, Java compatible expressions.

The JAR generator 1028 can compile Java source files, produce byte code, and pack everything into a single JAR, including other child bots and file dependencies. The JAR generator 1028 can take, as input, generated Java files, resource files used during the bot creation, bot compiler dependencies, and command packages, and can then generate a JAR artifact as an output. The JAR cache manager 1030 can put a bot JAR in a cache repository so that recompilation can be avoided if the bot has not been modified since the last cache entry. The JAR cache manager 1030 can take, as input, a bot JAR.
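
The caching behavior of the JAR cache manager 1030 can be illustrated with the following hypothetical Java sketch, which keys a compiled bot JAR by a digest of the bot source so that an unmodified bot is not recompiled. The cache layout, keying scheme, and class names are assumptions for illustration only.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HexFormat;

    /** Hypothetical cache keyed by a digest of the bot source (illustration only). */
    final class JarCacheManager {
        private final Path cacheDir;

        JarCacheManager(Path cacheDir) { this.cacheDir = cacheDir; }

        /** Returns the cached JAR for this bot source, or null on a cache miss. */
        Path lookup(Path botFile) throws IOException {
            Path cached = cacheDir.resolve(digest(botFile) + ".jar");
            return Files.exists(cached) ? cached : null;
        }

        /** Stores a freshly compiled JAR under the bot's current digest. */
        void store(Path botFile, Path compiledJar) throws IOException {
            Files.createDirectories(cacheDir);
            Files.copy(compiledJar, cacheDir.resolve(digest(botFile) + ".jar"),
                    StandardCopyOption.REPLACE_EXISTING);
        }

        private static String digest(Path botFile) throws IOException {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                return HexFormat.of().formatHex(md.digest(Files.readAllBytes(botFile)));
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);  // SHA-256 is a required JDK algorithm
            }
        }
    }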

In one or more embodiments described herein, command action logic can be implemented by commands 901 available at the control room 708. This permits the execution environment on a device 710 and/or 715, such as exists in a user session 718, to be agnostic to changes in the command action logic implemented by a bot 704. In other words, the manner in which a command implemented by a bot 704 operates need not be visible to the execution environment in which the bot 704 operates. The execution environment is able to be independent of the command action logic of any commands implemented by bots 704. The result is that changes in any commands 901 supported by the RPA system 700, or the addition of new commands 901 to the RPA system 700, do not require an update of the execution environment on devices 710, 715. This avoids what can be a time and resource intensive process in which the addition of a new command 901 or a change to any command 901 requires an update to the execution environment on each device 710, 715 employed in an RPA system. Take, for example, a bot that employs a command 901 that logs into an online service. The command 901, upon execution, takes a Uniform Resource Locator (URL), opens (or selects) a browser, retrieves credentials corresponding to the user on whose behalf the bot is logging in, and enters the user credentials (e.g., username and password) as specified. If the command 901 is changed, for example, to perform two-factor authentication, then it will require an additional resource (the second factor for authentication) and will perform additional actions beyond those performed by the original command (for example, logging into an email account to retrieve the second factor and entering the second factor). The command action logic will have changed because the bot is required to perform the additional actions. Any bot(s) that employ the changed command will need to be recompiled to generate a new bot JAR for each changed bot, and the new bot JAR will need to be provided to a bot runner upon request by the bot runner. The execution environment on the device that is requesting the updated bot will not need to be updated, as the command action logic of the changed command is reflected in the new bot JAR containing the byte code to be executed by the execution environment.
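
The decoupling described above can be illustrated with the following hypothetical Java sketch, in which the execution environment depends only on a stable command contract while the command action logic lives in the bot JAR. The interface and class names are assumptions; under this sketch, revising LoginCommand (e.g., to add a second authentication factor) changes only the recompiled bot JAR, not the execution environment.

    /** Stable contract seen by the execution environment (hypothetical). */
    interface Command {
        void execute(ExecutionContext ctx);
    }

    /** Hypothetical runtime context supplied by the execution environment. */
    final class ExecutionContext {
        String credential(String key) {
            return "";  // placeholder for a credential-vault lookup
        }
    }

    /** Action logic packed into the bot JAR; only this class changes when revised. */
    final class LoginCommand implements Command {
        private final String url;

        LoginCommand(String url) { this.url = url; }

        @Override
        public void execute(ExecutionContext ctx) {
            // Original logic: open (or select) a browser at url, then enter the
            // username and password obtained via ctx.credential(...).
            // A two-factor revision would add steps here (e.g., retrieving and
            // entering the second factor); the Command contract is unchanged.
        }
    }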

The embodiments herein can be implemented in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target, real or virtual, processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The program modules may be obtained from another computer system, such as via the Internet, by downloading the program modules from the other computer system for execution on one or more different computer systems. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system. The computer-executable instructions, which may include data, instructions, and configuration parameters, may be provided via an article of manufacture including a computer readable medium, which provides content that represents instructions that can be executed. A computer readable medium may also include a storage or database from which content can be downloaded. A computer readable medium may further include a device or product having content stored thereon at a time of sale or delivery. Thus, delivering a device with stored content, or offering content for download over a communication medium, may be understood as providing an article of manufacture with such content described herein.

FIG. 11 illustrates a block diagram of an exemplary computing environment 1100 for an implementation of an RPA system, such as the RPA systems disclosed herein. The embodiments described herein may be implemented using the exemplary computing environment 1100. The exemplary computing environment 1100 includes one or more processing units 1102, 1104 and memory 1106, 1108. The processing units 1102, 1104 execute computer-executable instructions. Each of the processing units 1102, 1104 can be a general-purpose central processing unit (CPU), a processor in an application-specific integrated circuit (ASIC) or any other type of processor. For example, as shown in FIG. 11, the processing unit 1102 can be a CPU, and the processing unit 1104 can be a graphics/co-processing unit (GPU). The tangible memory 1106, 1108 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The hardware components may be standard hardware components, or alternatively, some embodiments may employ specialized hardware components to further increase the operating efficiency and speed with which the RPA system operates. The various components of the exemplary computing environment 1100 may be rearranged in various embodiments, and some embodiments may not require or include all of the above components, while other embodiments may include additional components, such as specialized processors and additional memory.

The exemplary computing environment 1100 may have additional features such as, for example, tangible storage 1110, one or more input devices 1114, one or more output devices 1112, and one or more communication connections 1116. An interconnection mechanism (not shown) such as a bus, controller, or network can interconnect the various components of the exemplary computing environment 1100. Typically, operating system software (not shown) provides an operating system for other software executing in the exemplary computing environment 1100, and coordinates activities of the various components of the exemplary computing environment 1100.

The tangible storage 1110 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way, and which can be accessed within the exemplary computing environment 1100. The tangible storage 1110 can store instructions for the software implementing one or more features of an RPA system as described herein.

The input device(s) or image capture device(s) 1114 may include, for example, one or more of a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, an imaging sensor, a touch surface, or any other device capable of providing input to the exemplary computing environment 1100. For a multimedia embodiment, the input device(s) 1114 can, for example, include a camera, a video card, a TV tuner card, or similar device that accepts video input in analog or digital form, a microphone, an audio card, or a CD-ROM or CD-RW drive that reads audio/video samples into the exemplary computing environment 1100. The output device(s) 1112 can, for example, include a display, a printer, a speaker, a CD-writer, or any other device that provides output from the exemplary computing environment 1100.

The one or more communication connections 1116 can enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data. The communication medium can include a wireless medium, a wired medium, or a combination thereof.

The various aspects, features, embodiments or implementations of the invention described above can be used alone or in various combinations.

Embodiments of the invention can, for example, be implemented by software, hardware, or a combination of hardware and software. Embodiments of the invention can also be embodied as computer readable code on a computer readable medium. In one embodiment, the computer readable medium is non-transitory. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium generally include read-only memory and random-access memory. More specific examples of the computer readable medium are tangible and include Flash memory, EEPROM memory, memory cards, CD-ROMs, DVDs, hard drives, magnetic tape, and optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

Numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will become obvious to those skilled in the art that the invention may be practiced without these specific details. The description and representation herein are the common meanings used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the present invention.

In the foregoing description, reference to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams representing one or more embodiments of the invention does not inherently indicate any particular order nor imply any limitations in the invention.

The many features and advantages of the present invention are apparent from the written description. Further, since numerous modifications and changes will readily occur to those skilled in the art, the invention should not be limited to the exact construction and operation as illustrated and described. Hence, all suitable modifications and equivalents may be resorted to as falling within the scope of the invention.

What is claimed is:
 1. A computer-implemented method for extracting data from an image of a document, the computer-implemented method comprising: detecting a plurality of objects in the image of the document, the plurality of objects detected including at least a table, a table header and table header elements; identifying columns within the table; selecting at least one of the columns; identifying rows within the table based on positions of data items within the selected column; identifying cells within the table at the intersection of the identified rows and the identified columns; and extracting content from the identified cells.
 2. A computer-implemented method as recited in claim 1, wherein the identifying identifies the columns within the table based on the table header elements.
 3. A computer-implemented method as recited in claim 1, wherein the object data denotes at least a portion of the document having the detected object.
 4. A computer-implemented method as recited in claim 1, wherein the object data denotes at least an object type for the detected object, and wherein the object type denotes a table.
 5. A computer-implemented method as recited in claim 1, wherein the object data retrieved by the retrieving is retrieved using, at least in part, a Natural Language Processing Model trained for extraction of data from object blocks pertaining to tables.
 6. A computer-implemented method for extracting data from an image of a document, the computer-implemented method comprising: receiving the image for the document; detecting objects within the document image; identifying a table, a table header and table header elements from the detected objects; identifying columns within the table; selecting at least one of the columns; identifying rows within the table based on positions of data items within the selected column; identifying cells within the table at the intersection of the identified rows and the identified columns; and extracting content from the identified cells.
 7. A computer-implemented method as recited in claim 6, wherein the identifying rows within the table comprises: identifying the data items in the selected column; drawing bounding boxes around the data items within the selected column; and identifying rows within the table based on positions of the bounding boxes around the data items within the selected column.
 8. A computer-implemented method as recited in claim 6, wherein the data items are word blocks.
 9. A computer-implemented method as recited in claim 6, wherein the selected column is selected based on the table header element associated with the selected column.
 10. A computer-implemented method as recited in claim 6, wherein the table header element for the selected column pertains to quantity or a substitute descriptor therefor.
 11. A computer-implemented method as recited in claim 6, wherein the table header element for the selected column pertains to price/amount or a substitute descriptor therefor.
 12. A computer-implemented method as recited in claim 6, wherein the selected column is a left-most column in the table.
 13. A computer-implemented method as recited in claim 6, wherein the selecting of the selected column comprises: determining whether one or more of the table header elements are the same as or equivalent to one or more of a set of predetermined descriptors; and selecting the selected column based on the one or more of the table header elements that are determined to be the same as or equivalent to one or more of the set of predetermined descriptors.
 14. A non-transitory computer readable medium including at least computer program code tangibly stored thereon for extracting data from an image of a document, the computer readable medium comprising: computer program code for detecting a plurality of objects in the image of the document, the plurality of objects detected including at least a table, a table header and table header elements; computer program code for identifying columns within the table; computer program code for selecting at least one of the columns; computer program code for identifying rows within the table based on positions of data items within the selected column; computer program code for identifying cells within the table at the intersection of the identified rows and the identified columns; and computer program code for extracting content from the identified cells.
 15. A non-transitory computer readable medium as recited in claim 14, wherein the computer program code for identifying identifies the columns within the table based on the table header elements.
 16. A non-transitory computer readable medium as recited in claim 14, wherein the object data denotes at least a portion of the document having the detected object.
 17. A non-transitory computer readable medium as recited in claim 15, wherein the object data denotes at least an object type for the detected object, and wherein the object type denotes a table.
 18. A non-transitory computer readable medium as recited in claim 17, wherein the computer program code for identifying columns identifies the columns within the table based on the table header elements.
 19. A non-transitory computer readable medium including at least computer program code tangibly stored thereon for extracting data from an image of a document, the computer readable medium comprising: computer program code for identifying a table and table header elements from objects detected within the image of the document, the table header elements being part of a table header for the table; computer program code for identifying columns within the table; computer program code for selecting at least one of the columns; computer program code for identifying rows within the table based on positions of data items within the selected column; computer program code for identifying cells within the table at the intersection of the identified rows and the identified columns; and computer program code for extracting content from the identified cells.
 20. A non-transitory computer readable medium as recited in claim 19, wherein the computer program code for identifying rows within the table comprises: computer program code for identifying the data items in the selected column; computer program code for drawing bounding boxes around the data items within the selected column; and computer program code for identifying rows within the table based on positions of the bounding boxes around the data items within the selected column.
 21. A non-transitory computer readable medium as recited in claim 20, wherein the data items are word blocks.
 22. A non-transitory computer readable medium as recited in claim 19, wherein the selected column is selected based on the table header element associated with the selected column.
 23. A non-transitory computer readable medium as recited in claim 22, wherein the table header element for the selected column pertains to quantity or a substitute descriptor therefor.
 24. A non-transitory computer readable medium as recited in claim 22, wherein the table header element for the selected column pertains to price/amount or a substitute descriptor therefor.
 25. A non-transitory computer readable medium as recited in claim 19, wherein the selected column is a left-most column in the table.
 26. A non-transitory computer readable medium as recited in claim 19, wherein the computer program code for selecting the selected column comprises: computer program code for determining whether one or more of the table header elements are the same as or equivalent to one or more of a set of predetermined descriptors; and computer program code for selecting the selected column based on the one or more of the table header elements that are determined to be the same as or equivalent to one or more of the set of predetermined descriptors.